Abstract

Self-exciting processes of Hawkes type have been used to model various phe- 
nomena including earthquakes, neural activities, and views of online videos. 
Studies of temporal networks have revealed that sequences of social interevent
intervals for individuals are highly bursty. We examine some basic properties
of event sequences generated by the Hawkes self-exciting process to show that 
it generates bursty interevent intervals for a wide parameter range. Then, we 
fit the model to the data of conversation sequences recorded in company offices
in Japan. In this way, we can estimate relative magnitudes of the
self-excitation, its temporal decay, and the base event rate independent of the
self-excitation. These variables depend strongly on the individual. We also point
out that the Hawkes model has an important limitation: the correlation
in the interevent intervals and the burstiness cannot be independently
modulated.

1 Introduction

Social networks, which specify the pairs of individuals that are directly con- 
nected and those that are not, are substrates of social interactions. Properties 
of social networks, both online and offline, have been clarified using various 
techniques in different research fields, particularly in recent years. An impor- 
tant caveat in the use of social networks for understanding social behavior is 
that the pair of directly connected individuals does not interact all the time. 
Social events between a pair of individuals, such as dialogues and transmis- 
sion of email, are better described as a sequence of events, i.e., a collection 
of tagged event times, where the tag includes, for example, the identity of 
the two individuals, type of the event, duration, and content of dialogues. In 
fact, recent massive data, mostly online, and technological developments of 
recording devices of offline social interaction enable recording of social events 
with a higher temporal (and spatial) precision than before. Examples of data 
taken in this domain include calling activity [2], web recommendation writ- 
ing [16], email traffic [lj[2[26], online forum dealing with sexual escorts [36] . 
human interactions in the real space [3J[T71[IH1SI3 , to name a few. Transmis- 
sion of infection or information may occur only during the period in which 
two individuals are involved in an event. A set of such event sequences among 
pairs of individuals are collectively called the temporal network [TS] , which 
is the focus of this volume. 

A remarkable finding derived from the analysis of event sequences is that 
the interevent interval (IEI) is distributed according to a long-tailed distri- 
bution in many cases. The survivor functions (also called the complementary 
cumulative distributions) of IEI (i.e., the probability that the IEI is larger 
than a given value t), are shown in Fig. 1 for the conversation sequences
of two individuals in different data sets D1 and D2 used in our previous
study [5T] (see Sec. 3.1 for descriptions of the data). Figure 1 indicates that
the distributions are long-tailed. This evidence contradicts standard models
of social dynamics such as epidemics and opinion formation on social net- 
works studied on classical and complex networks. An almost universal and 
implicit assumption underlying these models of dynamics on networks has 
been that the IEI is distributed according to the independent exponential
distribution such that the event sequence is a realization of a Poisson process.
Recent studies have addressed the effects of long-tailed IEI distributions on
collective dynamics such as epidemic spreading [16,18,21,37,39,44] and opinion
formation (HUSHED]. 

Different mechanisms seem to explain the non-Poissonian behavior of the 
IEI. A first mechanism that was discovered to generate power-law IEI dis- 
tributions is a priority queue model pQ. In this class of models, each task 
corresponding to an event carries a priority level and arrives at a queue. 
Then, the queue tends to execute tasks with high priority; tasks with low 
priority are made to wait for a long time before being executed. Although a 
single priority queue may not represent social interaction such as conversation
events, the priority queue has been extended to allow for interaction of
two priority queues between a pair of interacting individuals [T§1 1271 1521 HT] . 
However, some types of social interaction including conversations may not 
proceed like a queue. Therefore, we attempt an alternative approach in the 
present chapter. 

Another facet of actual event sequences is that they often possess positive 
temporal correlation. In other words, a long (short) IEI is likely to be fol- 
lowed by a long (short) IEI. This is the case even if the effect of circadian 
fluctuations is removed from data [50]. Although there are various methods 
to measure temporal correlation of the IEI [20], here we show it by simply 
measuring 

\[ \tau_{\mathrm{next}}(\tau) \equiv \langle \tau_{i+1} \rangle_{\tau_i = \tau}, \qquad (1) \]

where τ_i is the ith IEI in a sequence, and ⟨·⟩ represents the average. The
values of τ_next are plotted against τ in Figs. 2(a), 2(b), and 2(c) for the
conversation sequences used in Fig. 1, times of email sending and receiving
in a university [6], and times of online sexual escorts by male individuals [36],
respectively. We remark that long-tailed IEI distributions are known for the
email [U|43] and sexual escort [36] data sets. The conditional mean IEI τ_next
increases with τ in Figs. 2(a) and 2(b). Therefore, adjacent IEIs are positively
correlated. In Fig. 2(c), τ_next decreases with τ for τ < 7 and increases with τ
for τ > 7. Figure 2(c) suggests that those who have bought an escort tend to
avoid buying a next escort within a week. This is directly shown in Fig. 2(d),
which shows the IEI distribution. However, adjacent IEIs for the sexual escort
data are positively correlated on a longer time scale (Fig. 2(c)).
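The conditional mean in Eq. (1) can be estimated from a single sequence of event times by grouping consecutive IEI pairs according to the length of the preceding IEI. The following Python sketch is one way to do this. The function name `conditional_next_iei` and the choice of equal-count bins are assumptions made for illustration; they are not taken from the analysis behind the figures.

```python
import random
from statistics import fmean


def conditional_next_iei(event_times, n_bins=10):
    """Estimate tau_next(tau) from one event sequence.

    Returns two lists of length n_bins: the mean preceding IEI in each bin
    and the mean of the IEI that immediately follows it.
    Assumes the sequence yields at least n_bins consecutive IEI pairs.
    """
    t = sorted(event_times)
    iei = [b - a for a, b in zip(t, t[1:])]
    # Pair each IEI with its successor and sort the pairs by the preceding IEI.
    pairs = sorted(zip(iei[:-1], iei[1:]))
    size = len(pairs) // n_bins  # equal-count bins; leftover pairs are dropped
    tau, tau_next = [], []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        tau.append(fmean(p[0] for p in chunk))
        tau_next.append(fmean(p[1] for p in chunk))
    return tau, tau_next


# Usage on a synthetic Poisson sequence (rate 1), for which tau_next is flat.
rng = random.Random(0)
t_now, poisson_times = 0.0, []
for _ in range(50001):
    t_now += rng.expovariate(1.0)
    poisson_times.append(t_now)
tau, tau_next = conditional_next_iei(poisson_times, n_bins=5)
```

For a Poisson process the estimated τ_next stays close to the mean IEI in every bin, whereas a curve that increases with τ indicates positive correlation between adjacent IEIs.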

Some models generate event sequences that possess positive IEI correla- 
tion, although this point is not necessarily mentioned in the literature. For 
example, in the discrete time model proposed in [10], the probability of an 
event decreases if events occurred too frequently in the recent past and in- 
creases if the time since the last event becomes long. Such a mechanism may 
generate positive IEI correlation. 

An alternative mechanism that yields positive IEI correlation is
self-excitation. The idea is that once an individual talks with somebody, the
individual is excited to talk with somebody at a higher rate. Malmgren
and coworkers developed such models and applied them to data of email and letter
correspondence [25, 26]. In the cascading nonhomogeneous Poisson process
proposed in [26], the authors assumed that the primary process is an
inhomogeneous Poisson process with a periodic event rate. An event generated
from the primary process is assumed to elevate the system to the active state 
and trigger cascades of activity. In other words, after a trigger event, a burst 
of events may ensue as a result of the Poisson process with a rate that is 
larger than the base rate of the primary process. The entire recording period 
is divided into alternately appearing intervals of the excited state with a high 
event rate and the normal state with a low event rate by an adjustment of the 
position and number of intervals to yield a good fit to the data. As a result, 






Naoki Masuda and Taro Takaguchi and Nobuo Sato and Kazuo Yano 



the number of events contained in a burst is shown to obey an approximate 
exponential distribution (also see [3D], which shows that the number of events 
in a burst obeys a power law distribution; the definition of burst is different in 
the two papers). With a circadian and weekly rate modulation, the cascading 
nonhomogeneous Poisson process is capable of producing the long-tailed IEI 
distributions observed in the data. 

Their model is complex in the sense that many parameters have to be 
estimated. By simulated annealing, they determined the best assignment of 
active intervals. The same is true of their other model, proposed in [25].
In [25], letter writing activity of each renowned individual is fitted by a 
cascading Poisson process model. The time unit is set to a day. The two 
parameters, i.e., the base event rate and tendency to write an additional 
letter within a time unit, are estimated on the basis of the data. Because the 
different parameter values are assumed for different sections of the data, the 
number of the parameters in the model can be large. In the case of the letter 
correspondence by Einstein, data are collected over 54 years, and the two 
parameters are estimated for each year. Therefore, there are 108 parameters. 

These models [25, 26] are quite successful in capturing properties of the
real event sequences. Nevertheless, it may also be fruitful to consider a much
simpler model as a complementary approach to capture the origins of bursts 
and IEI correlation inherent in human behavior. 

A simple two state model in which normal and excited states are assumed 
is proposed in [20]. The model is not a hidden Markov model because the 
probability of staying in the excited state becomes large as the number of 
events that have already occurred in the current burst increases. The model 
with proper parameter values reproduces properties of the original data such 
as the power-law IEI distribution and autocorrelation function. 

Statistical methods to estimate the model parameters from the data were 
not presented for the model proposed in [20] . In this contribution, we fit the 
point process model called the Hawkes process [TOfLl] to the data recorded 
in company offices [41, 46, 48] (also see Fig. 1 and Fig. 2(a)). A main benefit
of using the Hawkes process is that it contains a small number of parameters
and is mathematically tractable; the maximum likelihood (ML) method is
established for some important special cases [33]. In Sec. 2, we recapitulate
the Hawkes process and numerically investigate properties of event sequences
generated by the Hawkes process. Then, we carry out the ML estimation of
the parameters and compare the data and the estimated model in Sec. 3.

2 Hawkes process

2.1 Definition and basic properties 

The Hawkes process [HUT3] l45] is a self-exciting point process model that is 
analytically tractable. It is an inhomogeneous Poisson process in which the 
instantaneous event rate depends on the history of the time series of events. 
It is not a renewal process. The event rate at time t, denoted by λ(t), is given
by

\[ \lambda(t) = \nu + \sum_{i:\, t_i < t} \phi(t - t_i), \qquad (2) \]

where t_i is the time of the ith event, and φ(t) is the memory kernel, i.e., the
additional rate incurred by an event. The causality implies φ(t) = 0 (t < 0).

The Hawkes process has been used for modeling, for example, seismological 
data [301 S3, video viewing activities [HHB], neural spike trains [33], and 
genomic data [35]. For example, in [3], time series of views of different videos
on YouTube were categorized into three classes, which were characterized by
different φ(t) and different time-dependent versions of ν. The Hawkes process
has also been used to construct a method to estimate the structure of neural 
networks from given spike trains [SJ , analyze auto and cross correlation in data 
recorded from mouse retina [24] , and understand the correlation between the 
activities of different neurons in pulse-coupled model networks of excitatory 
and inhibitory neurons [33]. In [35], the Hawkes process is used to model 
stochastic occurrences of specific genes on DNA sequences. The method to 
estimate a piecewise linear φ(t) based on the least square error was presented.

Depending on applications, the memory kernel φ(t) has been assumed to
be a hyperbolic (i.e., power law) function [4] or a superposition of the gamma
function [3D]. Nevertheless, in the present work, we simply set

\[ \phi(t) = \alpha e^{-\beta t} \quad (t \ge 0) \qquad (3) \]

for the following reasons. First, it allows the ML estimation of the parameters
α, β, and ν [33]. Second, the Hawkes process with Eq. (3) has a small number
of parameters as compared to competing models with self-excitation [25, 26,
30, 31]. It should be noted that Eq. (3) indicates that the self-exciting effect
of an event decays in time. This contrasts with a previous model in which
the self-exciting effect is constant for some time, after which the event rate
returns to the basal rate [26]. An example time course of the event rate λ(t) and
the corresponding event sequence given Eq. (3) are shown in Fig. 3.

We define a cluster of events as the set of events that are triggered by a
single event occurring at the basal rate ν. In other words, all the events in
a cluster are descendants of the trigger event. The expected cluster size is
given by [12, 45]

\[ c = \frac{1}{1 - \int_0^\infty \phi(t)\, dt} = \frac{1}{1 - \alpha/\beta}, \qquad (4) \]

and the stationary event rate is given by

\[ \lambda = c\nu = \frac{\nu}{1 - \alpha/\beta}. \qquad (5) \]

The convergence of the event rate requires α < β.
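A minimal simulation sketch of the process defined by Eqs. (2) and (3) is given below. It uses Ogata's thinning algorithm, which is an assumption on our part and not necessarily the method used for the numerical results in this chapter; the function name `simulate_hawkes` and its arguments are hypothetical. Because the kernel is exponential, the excess rate x(t) = λ(t) − ν decays at rate β between events and jumps by α at each event, so only x has to be stored.

```python
import math
import random


def simulate_hawkes(nu, alpha, beta, n_events, seed=0):
    """Simulate event times of a Hawkes process with kernel alpha*exp(-beta*t).

    Thinning sketch: between events the rate only decays, so the current rate
    nu + x is a valid upper bound until the next accepted event.
    The process is started with no excitation (x = 0), i.e., not in the
    stationary state; this transient is negligible for long sequences.
    """
    rng = random.Random(seed)
    t, x = 0.0, 0.0  # x = lambda(t) - nu, the self-excited part of the rate
    events = []
    while len(events) < n_events:
        lam_bar = nu + x                  # upper bound on the rate
        w = rng.expovariate(lam_bar)      # candidate waiting time
        t += w
        x *= math.exp(-beta * w)          # excitation decays during the wait
        if rng.random() * lam_bar <= nu + x:
            events.append(t)              # candidate accepted as an event
            x += alpha                    # each event adds phi(0) = alpha
    return events


# Usage: with nu = 1, alpha = 0.5, beta = 1, Eq. (5) gives a stationary rate of 2.
events = simulate_hawkes(nu=1.0, alpha=0.5, beta=1.0, n_events=20000, seed=1)
empirical_rate = len(events) / events[-1]
```

Comparing `empirical_rate` with ν/(1 − α/β) from Eq. (5) is a quick check that a simulated sequence is long enough for the initial transient to be negligible.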



2.2 Statistics of IEI 



In this section, we numerically examine basic properties of the Hawkes process
with the exponential memory kernel. To quantify the broadness of the IEI
distribution, we measure the coefficient of variation (CV), defined as the
standard deviation of the IEI divided by the mean of the IEI as follows:

\[ \mathrm{CV} = \frac{\sqrt{\sum_{i=1}^{N} (\tau_i - \langle\tau\rangle)^2 / (N-1)}}{\langle\tau\rangle}, \qquad (6) \]

where N is the number of IEIs in a given sequence and ⟨τ⟩ = Σ_{i=1}^{N} τ_i/N is
the mean IEI. It should be noted that ⟨τ⟩ in the limit N → ∞ is equal to
1/λ = (1 − α/β)/ν. The Poisson process yields CV = 1.

We also measure the correlation coefficient for the IEI defined as

\[ \frac{\sum_{i=1}^{N-1} (\tau_i - \langle\tau\rangle)(\tau_{i+1} - \langle\tau\rangle) / (N-1)}{\sum_{i=1}^{N} (\tau_i - \langle\tau\rangle)^2 / (N-1)}. \qquad (7) \]

The Hawkes process is invariant under the following rescaling of the time
and parameter values: Ct = t', α = Cα', β = Cβ', and ν = Cν', where C > 0
is a constant. Therefore, we normalize the time by setting ν = 1 and vary α
and β. The values of CV, IEI correlation, and mean cluster size c are invariant
under this rescaling. For a given pair of α and β values, we generate a time
series with 2 × 10^5 events using the method described in [5^] and calculate
the statistics of the IEI.

The values of CV, IEI correlation, and c (Eq. (4)) for various α and β
values are shown in Fig. 4(a), Fig. 4(b), and Fig. 4(c), respectively. Although
we can theoretically calculate CV using the expression of the IEI distribution
[13] (also see Appendix 1 for details), it is numerically demanding to do so.
Therefore, we resorted to direct numerical simulations. The data are present
only in the region α < β, where the Hawkes process does not explode.

Figure 4(a) indicates that the Hawkes process generates a wide range of
CV. A large value of α/β (< 1) yields a large CV value. This is the case for
both small and large α values. In Fig. [BJ the survival function of the IEI on



Self-exciting point process modeling 






the basis of 2 × 10^5 events is compared for different α and β values that satisfy
β = 1.1α or 1.2α. Although the CV values are large, the IEI distributions
are consistently different from power law distributions. In particular, the IEI
distribution seems to be a superposition of two distributions with different
time scales when α is large (Fig. EJc)). It should be noted that we assumed
the exponential, not long-tailed, memory kernel (Eq. (3)).

Figure 4(b) indicates that a large α/β value also yields a large IEI
correlation. Once the event rate increases because of recent occurrences of other
events, the following IEI tends to be small. Therefore, strong self-excitation
in the model (i.e., large α/β) is considered to cause large IEI correlation. The
strength of self-excitation can also be quantified by c. Figure 4(c) indicates
that a large α/β tends to yield a large c.

Figures 4(a), 4(b), and 4(c) look similar. To be quantitative, we calculate
the correlation coefficient between the three quantities. We calculate the CV,
IEI correlation, and c for (α, β) = (0.2i, 0.2j), where i and j are integers such
that 0 < i < j < 100. By regarding the CV, IEI correlation, and c values
with a given pair (α, β) as a data point, we calculated the Pearson correlation
coefficient between each pair of the three quantities. The correlation coeffi- 
cient between the CV and IEI correlation, that between the CV and c, and 
that between the IEI correlation and c are equal to 0.775, 0.915, and 0.540, 
respectively. We conclude that these three quantities are strongly correlated 
with each other. 
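The two IEI statistics of this section can be computed from a list of IEIs as sketched below. The CV follows the definition in the text (sample standard deviation with N − 1 in the denominator, divided by the mean). For the IEI correlation we assume a lag-1 covariance normalized by the same sample variance; the exact normalization in Eq. (7) may differ, and the function name `cv_and_iei_correlation` is hypothetical.

```python
import math
import random


def cv_and_iei_correlation(iei):
    """Return (CV, lag-1 IEI correlation) for a sequence of IEIs.

    Assumption: the covariance between adjacent IEIs is normalized by the
    sample variance of all N IEIs, both with N - 1 in the denominator.
    """
    n = len(iei)
    mean = sum(iei) / n
    var = sum((x - mean) ** 2 for x in iei) / (n - 1)
    cov = sum((a - mean) * (b - mean) for a, b in zip(iei, iei[1:])) / (n - 1)
    return math.sqrt(var) / mean, cov / var


# Usage: independent exponential IEIs (a Poisson process) give CV close to 1
# and an IEI correlation close to 0.
rng = random.Random(0)
poisson_iei = [rng.expovariate(1.0) for _ in range(50000)]
cv, corr = cv_and_iei_correlation(poisson_iei)
```

Applying the same function to IEIs from simulated Hawkes sequences on a grid of (α, β) values yields the kind of data points from which the pairwise Pearson coefficients above are computed.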



3 Fitting the Hawkes process to the data 
3.1 Data sets 

We analyze two data sets D1 and D2 of face-to-face interaction logs obtained
from different company offices in Japan. World Signal Center, Hitachi, Ltd.,
Japan collected the data using the Business Microscope system developed
by Hitachi, Ltd., Japan. For technical details concerning the data collection,
see [41, 46, 48]. Data sets D1 and D2 consist of recordings from 163 individuals
for 73 days and 211 individuals for 120 days, respectively. In total, D1 and
D2 contain 51879 and 125345 events, respectively. A static network
generated from data set D2, in which a link is defined to exist between a pair of
individuals when the two individuals have at least 10 conversation events, is
depicted in Fig. 5. The network is apparently composed of two communities.

Each subject wears a name tag containing an infrared module. The mod- 
ules can communicate with each other if they are less than three meters apart. 
The system is configured such that it detects conversations only when two 
subjects, each wearing a module, are facing each other. Each pair of modules 
whose owners are involved in conversations exchanges the owners' IDs every
10 seconds. The two individuals are defined to be involved in a conversation
event, simply called the event, if their modules exchange the IDs at least 
once in a minute. The time stamps of the events are stored in the name tag 
of each individual and eventually transferred to the central database. The 
module has other types of data such as the list of conversation partners and 
the duration of each event. 

In particular, the list of conversation partners was the main target of study 
in our first analysis of these data [41]. To recapitulate, we investigated the
predictability of conversation partners in [41]. We discarded the information
about the event times and analyzed the order of the partners with whom 
an individual has conversation events. For each individual, we measured the 
degree of determinism (or predictability) of such a partner sequence using the 
mutual information. We found that the partner sequences of most individ- 
uals have predictable components. A main part of the predictability comes 
from the fact that the same partner tends to be selected repeatedly once 
the partner appears in the sequence. Nevertheless, partner sequences have
residual predictable components even if we remove the effect of such bursts.
We related the degree of predictability to the position of individuals in social 
networks. Individuals having a high local clustering coefficient and strong 
links (i.e., links with many conversation events), presumably confined in a 
network community, tend to have small predictability. In contrast, those pre- 
sumably connecting different communities tend to have large predictability. 

In our second paper [42], we quantified the importance of events. By extending
the previous definition [23], we defined it as the amount of the new
information that an event carries about others to the two nodes involved in 
the event. The novelty of the information is defined as the update in the 
latest starting time of a temporal path that reaches a node involved in the 
event (see [H][22][23] for the definition of temporal path). We found that the 
importance of event is distributed according to a heterogeneous distribution. 
In particular, events on the same link occurring at different instants can have 
very different values of the importance. We also verified that our definition 
of the importance captures the role of events in connecting temporal paths. 
In particular, only a small fraction of high importance events is necessary 
and sufficient for connecting nodes along efficient temporal paths. The im- 
portance of event is different from but approximated by the weight of the 
link on which the event occurs and the last IEI between the two nodes. The 
heterogeneity in the importance of event stems from the heterogeneity in 
the IEIs, not from the heterogeneous degree distribution of the aggregated 
network. In fact, we found that the results are similar for artificial temporal
networks with a heterogeneous IEI distribution created on the regular ran- 
dom graph in which all the nodes have an identical degree. A relatively small 
fraction of high importance events connects nodes along efficient temporal 
paths on the artificial temporal networks. 






3.2 Results of fitting 

For the entire sequence of event times obtained for each individual, we carry 
out the ML estimation of the parameters of the Hawkes process with the 
exponential memory kernel. It should be noted that we use the information 
about event times and not the duration of events or the partners' IDs. We 
slightly modify the ML method developed in [33] (see Appendix 2 for details).

The modification is concerned with the treatment of the data during the 
night. Our data are nonstationary owing to the circadian and weekly rhythms. 
Therefore, direct application of the Hawkes process, which is a stationary 
point process, would be invalid. In the previous literature in which different 
models are investigated, these rhythms are explicitly modeled [9, 26] or treated
by dynamically changing the time scale according to the event rate [19]. In 
contrast, we omit the night part of the data from the analysis because our 
data are collected in company offices and therefore there is no event from late 
in the night through early in the morning. 

In both data sets D1 and D2, there is nobody in the office between four
and six in the morning. Accordingly, we can partition the data into workdays
without ambiguity. For each individual, we discard the workdays that contain
fewer than 40 conversation events. We call a workday containing at least 40
events a valid day. Then, we define the first event in each valid day as the
trigger event and set t = 0 at its time. The following events on the same valid
day are interpreted to be generated from the Hawkes process. The time of the last
event, denoted by t_last (denoted by tffr in Appendix 2), is defined to be the end
time of the valid day; it is necessary to specify t_last to apply the ML method
(Appendix 2). The value of t_last depends on the individual even on the same day.
The individual may stay in the office for a considerable amount of time after
t = t_last before leaving the office. This implies that the individual does not
have conversations with others remaining in the office between t = t_last and
the time when the individual leaves the office. If this is the case, the fact that
this individual does not have events for t > t_last may affect the ML estimators.
Nevertheless, we neglect this point. Finally, we obtain the likelihood of the
series of events for an individual by multiplying the likelihoods for all the valid
days.
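The likelihood for one valid day can be sketched as follows. The code assumes the standard log-likelihood of a point process with conditional intensity λ(t), namely log L = Σ_i log λ(t_i) − ∫_0^{t_last} λ(t) dt, with the trigger event at t = 0 contributing to λ(t) but not to the sum; the modification described in Appendix 2 may differ in detail. The function names `day_log_likelihood` and `total_log_likelihood` are hypothetical. The running sum `a` exploits the exponential kernel, so each day is processed in linear time.

```python
import math


def day_log_likelihood(times, nu, alpha, beta):
    """Log-likelihood of one valid day under the exponential-kernel Hawkes model.

    times: sorted event times measured from the trigger event at t = 0
           (the trigger itself is excluded); the last entry is t_last.
    """
    t_last = times[-1]
    log_l = 0.0
    a = 0.0      # sum of exp(-beta * (t_i - t_j)) over all earlier events j
    prev = 0.0   # time of the previous event; initially the trigger event
    for t in times:
        # The previous event contributes 1 at its own time; everything decays.
        a = math.exp(-beta * (t - prev)) * (1.0 + a)
        log_l += math.log(nu + alpha * a)
        prev = t
    # Integral of the rate over [0, t_last], including the trigger's kernel.
    history = [0.0] + list(times)
    integral = nu * t_last + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (t_last - s)) for s in history
    )
    return log_l - integral


def total_log_likelihood(days, nu, alpha, beta):
    """Multiplying likelihoods over valid days equals summing log-likelihoods."""
    return sum(day_log_likelihood(d, nu, alpha, beta) for d in days)
```

The ML estimators are then the values of ν, α, and β that maximize `total_log_likelihood` for an individual, which can be found with any general-purpose numerical optimizer under the constraints ν > 0 and 0 ≤ α < β.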

We apply the ML method to the individuals that possess at least 300 
valid IEIs (i.e., IEIs derived from the valid days) during the entire period. 
This thresholding leaves 63 individuals in D1 and 148 individuals in D2. We
also exclude one individual in D1 because the ML method does not converge
for this individual. 

The survivor function of the IEI is compared between the data and the 
estimated Hawkes process in Fig. 7. The comparison is made for an individual
in D1 (Fig. 7(a)) and an individual in D2 (Fig. 7(b)). We calculated the
IEI distribution for the estimated model using the theoretical method [13] 
(Appendix 1). The agreement between the IEI distributions of the data and 
the estimated model is excellent.

To assess the quality of the fit at a population level, we compare three
statistics of the IEI between the data and model for different individuals.
The relationship between the mean IEI obtained from the data and that
obtained from the estimated model, i.e., 1/λ = (1 − α/β)/ν, is shown in
Fig. 8(a). For different individuals in both data sets, the mean IEI is close
between the data and the model. The Pearson correlation coefficients between
the data and model are equal to 0.993 and 0.986 for D1 and D2, respectively.
However, the Hawkes process slightly underestimates the mean IEI.

The CV values for the data and the estimated model are compared in
Fig. 8(b). We calculated the CV values for the estimated model on the basis of
2 × 10^5 events that we obtained by simulating the Hawkes process with the
ML estimators α, β, and ν. Although the CV can be theoretically calculated
using the ML estimators (Appendix 1), we avoided doing so because the
theoretical method is computationally too costly to be applied to all the
individuals. Roughly speaking, the CV values obtained from the model are
close to those of the data. The Pearson correlation coefficients between the
data and model are equal to 0.832 and 0.936 for D1 and D2, respectively.

The IEI correlation of the data and that for the estimated model are 
compared in Fig. 8(c). We calculated the IEI correlation for the estimated
model by direct numerical simulations, as in the case of the CV. Figure 8(c)
indicates that the Hawkes process does not reproduce the IEI correlation for 
most individuals. The IEI correlation for the estimated model is distributed 
in a much narrower range than that of the data. This is consistent with 
the finding that the CV and the IEI correlation are positively correlated in 
the Hawkes process (Sec. 2.2). Because most individuals have the CV values
larger than unity (Fig. EJa)), the estimate of the IEI correlation obtained by 
the model tends to be positive regardless of the estimated values of α, β,
and ν. Figure 8(c) suggests that the Hawkes process with the exponential
memory kernel is incapable of approximating the real data in terms of the 
IEI correlation. 



4 Discussion 

We analyzed properties of the IEI generated by the Hawkes process with 
an exponential memory kernel and then fitted the model to the face-to-face 
interaction logs obtained from company offices. The model successfully re- 
produced the data in terms of the IEI distribution. However, the model does 
not explain the behavior of the IEI correlation in the data. 

This limitation may be because the effect of self-excitation is too strong in 
the Hawkes process; the event rate can be very large after a burst of events. 
To examine this issue, we carry out additional numerical simulations using 
a modified Hawkes model. We modify the model such that after each event
that would increase the event rate by φ(0) in the original Hawkes process,
we reset the event rate to the basal value ν with probability p. The original
Hawkes process corresponds to p = 0. The CV and IEI correlation for p = 0.1
and various values of α and β are shown in Figs. 9(a) and 9(b), respectively.
The values of the CV and IEI correlation for p = 0.1 are much smaller than
those for p = 0 (Figs. 4(a) and 4(b)). This is because a burst, which increases
the CV and IEI correlation in the Hawkes model, is forced to terminate
with probability p after each event in the modified model. The CV and IEI
correlation values for (α, β) = (0.2i, 0.2j), where 0 < i < j < 100, are plotted
in Fig. 9(c). For comparison, the corresponding results for p = 0 on the
basis of the data used in Figs. 4(a) and 4(b) are also shown in the figure. The
introduction of p > 0 does not decorrelate the CV and IEI correlation. In fact,
the Pearson correlation coefficient between the CV and the IEI correlation is
equal to 0.936; it is even larger than in the case of p = 0 (Sec. 2.2). To explain
the behavior of the IEI correlation in the present data, we need different
models. It seems that the IEI correlation has not been discussed in the context
of social interaction data, with a notable exception [20]. We are interested
in the capabilities of alternative models [10, 25, 26] in reproducing the IEI
correlation in the data.
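The modified model can be simulated with a small change to a thinning scheme for the exponential kernel, as sketched below. The thinning algorithm and the function name `simulate_modified_hawkes` are assumptions for illustration, not necessarily what was used for the results above. After each event, the excess rate x(t) = λ(t) − ν is set to zero with probability p and is otherwise increased by α = φ(0).

```python
import math
import random


def simulate_modified_hawkes(nu, alpha, beta, p, n_events, seed=0):
    """Simulate the Hawkes process with reset probability p after each event.

    p = 0 gives the original exponential-kernel Hawkes process;
    p = 1 gives a Poisson process with rate nu.
    """
    rng = random.Random(seed)
    t, x = 0.0, 0.0  # x = lambda(t) - nu
    events = []
    while len(events) < n_events:
        lam_bar = nu + x                  # the rate only decays until the next event
        w = rng.expovariate(lam_bar)
        t += w
        x *= math.exp(-beta * w)
        if rng.random() * lam_bar <= nu + x:
            events.append(t)
            if rng.random() < p:
                x = 0.0                   # reset the event rate to the basal value nu
            else:
                x += alpha                # usual self-excitation by phi(0) = alpha
    return events


def mean_iei(events):
    """Mean IEI, measuring the first interval from the start of the simulation."""
    return events[-1] / len(events)
```

A reset ends the ongoing burst, so the mean IEI for 0 < p < 1 lies between the value (1 − α/β)/ν of the original model and the Poisson value 1/ν.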

In the present study, we used the exponential memory kernel because it 
is analytically tractable and contains only three parameters. The original 
Hawkes process with other memory kernels has also been applied to data 
[HG0]. The ML method is available also for this case (30]. Nevertheless, we 
suspect that self-excitation inherent in the Hawkes process induces both high 
CV and positive IEI correlation for a variety of memory kernels. Therefore, 
the use of a different memory kernel may not improve the fit of the Hawkes
process to our data in terms of the IEI correlation. 

Two-state models [20, 25, 26], in which events are produced at high and
low rates in the excited and normal states, respectively, are also self-exciting. 
These models may be more realistic for social data than the standard Hawkes 
process used in this work in the sense that humans may not distinguish 
many different levels of self-excitation as is assumed in the Hawkes process. 
On the other hand, the Hawkes process with the exponential memory ker- 
nel is simpler than these models such that the ML methods are available 
and the parameters have simple physical meanings. Although the model by 
Malmgren and colleagues allows for the ML method [26] , the method is quite 
complicated and contains many parameters. It may be desirable to develop 
two-state models that are simple and allow for statistical methods. Alterna- 
tively, it may be desirable to modify the Hawkes process to account for the 
behavior of the IEI correlation in the real data. 

We lack methods to compare the goodness of fit of different models, except 
that it is straightforward to test the validity of a model against the Poisson 
process (but see [26]). We need to develop goodness of fit tests to compare the
performance of models proposed in different papers.

Appendix 1: IEI distribution of the Hawkes process

In this section, we explain the derivation of the IEI distribution of the Hawkes 
process shown in [13] . Also see [45] for introduction to mathematical treat- 
ments of the Hawkes and related processes. 

Consider a trigger event at t = 0 and the inhomogeneous Poisson process
with rate function φ(t), i.e., the point process directly induced by the trigger
event. The probability generating functional (PGFL) for this inhomogeneous
Poisson process, denoted by H, is given by

\[ H(z(\cdot)) = E\left[ \prod_i z(t_i) \right] = \exp\left\{ \int_0^\infty [z(t) - 1]\, \phi(t)\, dt \right\}, \qquad (8) \]

where z(t) is a carrying function, and t_i is the time of the ith event. We define
t_0 = 0.

The events at t = t_i may induce further events. On the basis of Eq. (8), the
PGFL for the inhomogeneous Poisson process including all the descendant
events induced by a trigger event at t = 0, denoted by F, is given through
the following recursive relation:

\[ F(z(\cdot)) = z(0) \exp\left\{ \int_0^\infty [F(z_t(\cdot)) - 1]\, \phi(t)\, dt \right\}, \qquad (9) \]

where z_t(t') = z(t' + t) is the time translation. On the right-hand side of
Eq. (9), z(0) accounts for the trigger event at t = 0, and F(z_t(·)) accounts
for the fact that an event triggered at time t initiates an inhomogeneous
Poisson process with rate φ(t) on top of the other inhomogeneous Poisson
processes going on.

We obtain the PGFL for the entire Hawkes process, denoted by $G$, by
combining Eq. (9) and the PGFL of the homogeneous Poisson process with
rate $\nu$ as follows:

$$G(z(\cdot)) = \exp\left\{\int_{-\infty}^{\infty} \nu\,[F(z_t(\cdot)) - 1]\,dt\right\}. \tag{10}$$

We set $z(t) = z$ for $t_s \le t \le t_s + \Delta$ and $z(t) = 1$ otherwise. Then,
$\pi(t_s, \Delta, z) = F(z(\cdot))$ is the probability generating function (PGF)
for the number of events in $[t_s, t_s + \Delta]$, with the carrying variable $z$, and

$$\pi(t_s - t, \Delta, z) = F(z_t(\cdot)) \tag{11}$$

is the PGF for the number of events in $[t_s - t, t_s - t + \Delta]$. Equation (9) is
reduced to

$$\pi(t_s, \Delta, z) = \begin{cases}
\exp\left\{\int_0^{t_s + \Delta} [\pi(t_s - t, \Delta, z) - 1]\,\phi(t)\,dt\right\}, & t_s > 0, \\
z\,\exp\left\{\int_0^{t_s + \Delta} [\pi(t_s - t, \Delta, z) - 1]\,\phi(t)\,dt\right\}, & -\Delta \le t_s \le 0, \\
1, & t_s < -\Delta.
\end{cases} \tag{12}$$

By setting $t_s = 0$ and combining Eqs. (10) and (11), we obtain the PGF for
the number of events in $[0, \Delta]$ as

$$Q_\Delta(z) = G(z(\cdot)) = \exp\left\{\int_{-\infty}^{\infty} \nu\,[\pi(-t, \Delta, z) - 1]\,dt\right\}. \tag{13}$$

In particular,

$$\pi(t_s, \Delta) = \pi(t_s, \Delta, 0) \tag{14}$$

is the probability that there is no event in $[t_s, t_s + \Delta]$ for a cluster of
events originating at $t = 0$. Using Eq. (12), we obtain

$$\pi(t_s, \Delta) = \begin{cases}
\exp\left\{\int_0^{t_s + \Delta} [\pi(t_s - t, \Delta) - 1]\,\phi(t)\,dt\right\}, & t_s > 0, \\
0, & -\Delta \le t_s \le 0, \\
1, & t_s < -\Delta.
\end{cases} \tag{15}$$



By setting $z = 0$ in Eq. (13) and using Eq. (15), we obtain the survivor
function of the forward recurrence time, i.e., the time to the next event from
an arbitrary time $t$, as follows:

$$Q_\Delta(0) = \Pr(\text{forward recurrence time} > \Delta) = \exp\left\{-\nu\Delta - \nu\int_0^{\infty} [1 - \pi(t, \Delta)]\,dt\right\}, \tag{16}$$

where Pr denotes probability. $Q_\Delta(0)$ is the probability that the Hawkes
process does not have any event in $[0, \Delta]$.
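
The step from Eq. (13) to Eq. (16) can be spelled out as follows. This is a short worked derivation, assuming that the integral in Eq. (13) runs over the whole real line and using the substitution $s = -t$ together with the three cases of Eq. (15):

```latex
\begin{align}
\log Q_\Delta(0)
  &= \nu \int_{-\infty}^{\infty} [\pi(s, \Delta) - 1]\, ds \\
  &= \nu \int_{-\infty}^{-\Delta} (1 - 1)\, ds
     + \nu \int_{-\Delta}^{0} (0 - 1)\, ds
     + \nu \int_{0}^{\infty} [\pi(s, \Delta) - 1]\, ds \\
  &= -\nu\Delta - \nu \int_{0}^{\infty} [1 - \pi(s, \Delta)]\, ds .
\end{align}
```

The first integral vanishes because a cluster that started more than $\Delta$ after the window contributes no events, and the second yields $-\nu\Delta$ because a trigger event inside the window always counts as an event.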

Finally, the distribution of the interevent interval $\tau$ is given in the form of
survivor function as

$$\Pr(\tau > \Delta) = -\frac{1}{\bar{\lambda}}\,\frac{dQ_\Delta(0)}{d\Delta}, \tag{17}$$

where the stationary event rate $\bar{\lambda}$ is given by Eq. (5).

In the numerical simulations, we adopted Simpson's rule for calculating the
integrals in Eqs. (15) and (16), and solved Eq. (15) by iteration.
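
The iteration described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: it assumes the exponential kernel $\phi(t) = \alpha e^{-\beta t}$, replaces Simpson's rule by the simpler trapezoidal rule, truncates the infinite integral at a hypothetical cutoff `t_max`, and assumes that the stationary event rate of Eq. (5) takes the standard form $\nu / (1 - \alpha/\beta)$. The function names and grid parameters are illustrative choices.

```python
import numpy as np

def trapezoid(f, h):
    """Trapezoidal rule on a uniform grid with spacing h."""
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

def no_event_probability(delta, alpha, beta, nu, t_max=20.0, n=2001,
                         max_iter=500, tol=1e-12):
    """Q_Delta(0) of Eq. (16) for phi(t) = alpha * exp(-beta * t)."""
    t = np.linspace(0.0, t_max, n)
    h = t[1] - t[0]
    phi = alpha * np.exp(-beta * t)
    # Integral of phi over [t_s, t_s + delta]; there pi(t_s - t, delta) = 0
    # in Eq. (15), so the integrand reduces to -phi(t).
    window = (alpha / beta) * np.exp(-beta * t) * (1.0 - np.exp(-beta * delta))
    pi = np.ones(n)  # initial guess for pi(t_s, delta), t_s > 0
    for _ in range(max_iter):
        g = pi - 1.0
        # conv[k] approximates int_0^{t_k} [pi(t_k - t) - 1] phi(t) dt
        # (trapezoidal rule written as a discrete convolution).
        conv = h * (np.convolve(g, phi)[:n] - 0.5 * (g * phi[0] + g[0] * phi))
        new_pi = np.exp(conv - window)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return np.exp(-nu * delta - nu * trapezoid(1.0 - pi, h))

def iei_survivor(delta, alpha, beta, nu, eps=1e-3):
    """Pr(tau > delta) from Eq. (17), using a finite-difference derivative."""
    rate = nu / (1.0 - alpha / beta)  # assumed form of the stationary rate
    lo = max(delta - eps, 0.0)
    hi = delta + eps
    q_lo = no_event_probability(lo, alpha, beta, nu)
    q_hi = no_event_probability(hi, alpha, beta, nu)
    return -(q_hi - q_lo) / ((hi - lo) * rate)
```

For $\alpha = 0$ the sketch reduces to the Poisson result $Q_\Delta(0) = e^{-\nu\Delta}$, which gives a simple sanity check. The fixed-point iteration slows down as $\alpha/\beta$ approaches 1, so `max_iter` and `t_max` may need to be increased in that regime.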

We remark that integration of Eq. (17) by parts leads to

$$\langle \tau \rangle = \frac{1}{\bar{\lambda}} \tag{18}$$

and

$$\langle \tau^2 \rangle = \frac{2}{\bar{\lambda}} \int_0^{\infty} Q_\Delta(0)\,d\Delta, \tag{19}$$

where $\bar{\lambda}$ is the stationary event rate.

Equations (18) and (19) can serve to calculate the CV. However, we did
not use them and obtained the CV by direct numerical simulations because
calculating the CV via Eqs. (18) and (19) is time consuming.
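
A direct simulation of this kind can be sketched as follows. The paper does not state which simulation algorithm was used; the sketch below assumes a standard thinning scheme for the exponential kernel $\phi(t) = \alpha e^{-\beta t}$ and starts from an empty history, so a short initial transient is included in the sample. The function names are illustrative.

```python
import numpy as np

def simulate_hawkes(nu, alpha, beta, t_end, rng):
    """Simulate event times on [0, t_end) by thinning (exponential kernel)."""
    events = []
    t = 0.0
    excitation = 0.0  # alpha * sum_i exp(-beta * (t - t_i)) at the current t
    while True:
        # Between events the rate only decays, so the current rate is a
        # valid upper bound until the next accepted event.
        bound = nu + excitation
        wait = rng.exponential(1.0 / bound)
        t += wait
        if t >= t_end:
            break
        excitation *= np.exp(-beta * wait)
        if rng.random() * bound <= nu + excitation:  # accept the candidate
            events.append(t)
            excitation += alpha
    return np.array(events)

def coefficient_of_variation(events):
    """CV of the interevent intervals (standard deviation over mean)."""
    iei = np.diff(events)
    return iei.std() / iei.mean()

rng = np.random.default_rng(0)
events = simulate_hawkes(nu=1.0, alpha=1.0, beta=2.0, t_end=5000.0, rng=rng)
print(np.diff(events).mean(), coefficient_of_variation(events))
```

For a long run, the sample mean IEI should approach the value given by Eq. (18), which offers a quick consistency check on the simulation before the CV is read off.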



Appendix 2: ML method for the Hawkes process 

In this section, we explain a slightly modified version of the ML method for 
the Hawkes process with the exponential memory kernel originally proposed 
in [33]. 

We let the event times be $0 \le t_1 \le t_2 \le \cdots \le t_N$. Unlike the
usual assumption for continuous-time point processes, we allow multiple
events to occur at the same time (i.e., $t_i = t_{i+1}$). Such simultaneous events
actually occur in our data because of the finite time resolution of one minute.
Simultaneous events do not prevent the application of the ML method
explained in the following.

For the exponential memory kernel given by Eq. (3), the event rate at time
$t$ is given by

$$\lambda(t) = \nu + \alpha \sum_{i=1}^{i_{\max}(t)} e^{-\beta(t - t_i)}, \tag{20}$$

where $i_{\max}(t)$ is the index of the last event before time $t$.

The likelihood of the event sequence during the time period $[0, t_N]$, denoted
by $L(t_1, \ldots, t_N)$, is given by

$$L(t_1, \ldots, t_N) = \exp\left(-\int_0^{t_N} \lambda(t)\,dt\right) \prod_{i=1}^{N} \lambda(t_i). \tag{21}$$



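By substituting Eq. (20) in Eq. (21), we obtain the log likelihood for the
original Hawkes process as follows [33]:

$$\log L(t_1, \ldots, t_N) = -\nu t_N + \sum_{i=1}^{N} \frac{\alpha}{\beta}\left(e^{-\beta(t_N - t_i)} - 1\right) + \sum_{i=1}^{N} \log\left(\nu + \alpha A(i)\right), \tag{22}$$

where

$$A(i) = \sum_{1 \le j < i} e^{-\beta(t_i - t_j)}, \quad 1 \le i \le N. \tag{23}$$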

Strictly speaking, the point process for an individual for one workday
begins when the individual has arrived in the office. Because we do not know
when the point process begins, we assume that the first event of each day is
a given trigger event. In other words, we set $t_1 = 0$ and modify Eq. (22) as

$$\log L(t_1, \ldots, t_N) = -\nu t_N + \sum_{i=1}^{N} \frac{\alpha}{\beta}\left(e^{-\beta(t_N - t_i)} - 1\right) + \sum_{i=2}^{N} \log\left(\nu + \alpha A(i)\right). \tag{24}$$

For each individual, we use the days that have at least 40 events. We index
such a valid day as $d = 1, 2, \ldots, d_{\max}$. We denote the event times of valid
day $d$ by $0 = t^d_1 \le \cdots \le t^d_{N_d}$, where $N_d$ is the number of events in
valid day $d$. The log likelihood of the entire sequence is given by the
summation of the log likelihood over all the valid days.
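
The log likelihood of Eq. (24), summed over valid days, can be evaluated as in the following minimal sketch. It assumes that event times are given in hours as one array per day, and it uses the recursion $A(i) = e^{-\beta(t_i - t_{i-1})}[1 + A(i-1)]$, which follows directly from Eq. (23). The function names and the `min_events` argument are illustrative.

```python
import numpy as np

def day_log_likelihood(times, alpha, beta, nu):
    """Eq. (24) for one day; the first event is the trigger at t = 0."""
    t = np.asarray(times, dtype=float)
    t = t - t[0]  # shift so that t_1 = 0
    n = len(t)
    a = np.zeros(n)  # a[i] holds A(i+1) in the 1-based notation of Eq. (23)
    for i in range(1, n):
        a[i] = np.exp(-beta * (t[i] - t[i - 1])) * (1.0 + a[i - 1])
    # Integral of the event rate over [0, t_N].
    compensator = nu * t[-1] + (alpha / beta) * np.sum(
        1.0 - np.exp(-beta * (t[-1] - t)))
    # The trigger event (first event) is excluded from the log-rate sum.
    return -compensator + np.sum(np.log(nu + alpha * a[1:]))

def total_log_likelihood(days, alpha, beta, nu, min_events=40):
    """Sum Eq. (24) over days with at least min_events events."""
    return sum(day_log_likelihood(d, alpha, beta, nu)
               for d in days if len(d) >= min_events)
```

The recursion makes the cost linear in the number of events per day, and simultaneous events are handled without special treatment because they simply contribute $e^{0} = 1$ to $A(i)$.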

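The partial derivatives of the log likelihood with respect to $\alpha$, $\beta$,
and $\nu$ are originally derived in [33]. In the present case, they read

$$\frac{\partial \log L}{\partial \alpha} = \sum_{d=1}^{d_{\max}} \left\{ \sum_{i=1}^{N_d} \frac{1}{\beta}\left[e^{-\beta(t^d_{N_d} - t^d_i)} - 1\right] + \sum_{i=2}^{N_d} \frac{A_d(i)}{\nu + \alpha A_d(i)} \right\}, \tag{25}$$

$$\frac{\partial \log L}{\partial \beta} = \sum_{d=1}^{d_{\max}} \left\{ -\alpha \sum_{i=1}^{N_d} \left[ \frac{1}{\beta}(t^d_{N_d} - t^d_i)\,e^{-\beta(t^d_{N_d} - t^d_i)} + \frac{1}{\beta^2}\left(e^{-\beta(t^d_{N_d} - t^d_i)} - 1\right) \right] - \sum_{i=2}^{N_d} \frac{\alpha B_d(i)}{\nu + \alpha A_d(i)} \right\}, \tag{26}$$

and

$$\frac{\partial \log L}{\partial \nu} = \sum_{d=1}^{d_{\max}} \left\{ -t^d_{N_d} + \sum_{i=2}^{N_d} \frac{1}{\nu + \alpha A_d(i)} \right\}, \tag{27}$$

where

$$A_d(i) = \sum_{1 \le j < i} e^{-\beta(t^d_i - t^d_j)}, \quad 1 \le i \le N_d, \tag{28}$$

and

$$B_d(i) = \sum_{1 \le j < i} (t^d_i - t^d_j)\,e^{-\beta(t^d_i - t^d_j)}, \quad 1 \le i \le N_d. \tag{29}$$

We obtain the ML estimates by setting the left-hand sides of Eqs. (25), (26),
and (27) to 0.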
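
The derivatives can be checked numerically against the log likelihood itself. The sketch below is an illustration under stated assumptions: it obtains the gradient by differentiating Eq. (24) summed over days, evaluates $A_d(i)$ and $B_d(i)$ directly from their definitions at $O(N_d^2)$ cost for transparency, and takes event times in hours as one array per day. The function names are illustrative.

```python
import numpy as np

def day_sums(t, beta):
    """A_d(i) and B_d(i), computed directly from their definitions."""
    n = len(t)
    earlier = np.tril(np.ones((n, n), dtype=bool), k=-1)  # True where j < i
    lag = np.where(earlier, t[:, None] - t[None, :], 0.0)  # t_i - t_j
    decay = np.where(earlier, np.exp(-beta * lag), 0.0)
    return decay.sum(axis=1), (lag * decay).sum(axis=1)

def log_likelihood(days, alpha, beta, nu):
    """Eq. (24) summed over days (first event of each day is the trigger)."""
    total = 0.0
    for day in days:
        t = np.asarray(day, dtype=float) - day[0]
        a, _ = day_sums(t, beta)
        e = np.exp(-beta * (t[-1] - t))
        total += -nu * t[-1] + (alpha / beta) * np.sum(e - 1.0)
        total += np.sum(np.log(nu + alpha * a[1:]))
    return total

def grad_log_likelihood(days, alpha, beta, nu):
    """Partial derivatives with respect to alpha, beta, and nu."""
    g_alpha = g_beta = g_nu = 0.0
    for day in days:
        t = np.asarray(day, dtype=float) - day[0]
        a, b = day_sums(t, beta)
        r = t[-1] - t              # time from each event to the last event
        e = np.exp(-beta * r)
        rate = nu + alpha * a[1:]  # event rate just before events 2..N_d
        g_alpha += np.sum(e - 1.0) / beta + np.sum(a[1:] / rate)
        g_beta += (-alpha * np.sum(r * e / beta + (e - 1.0) / beta**2)
                   - np.sum(alpha * b[1:] / rate))
        g_nu += -t[-1] + np.sum(1.0 / rate)
    return g_alpha, g_beta, g_nu
```

Comparing `grad_log_likelihood` with central finite differences of `log_likelihood` is a cheap way to catch sign or index errors before running any optimizer.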

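We carried out the gradient descent method to estimate $\alpha$, $\beta$, and $\nu$
for each individual. We repeated the substitution

$$\alpha \leftarrow \alpha + \delta\,\frac{\partial \log L}{\partial \alpha}, \tag{30}$$

$$\beta \leftarrow \beta + \delta\,\frac{\partial \log L}{\partial \beta}, \tag{31}$$

$$\nu \leftarrow \nu + \delta\,\frac{\partial \log L}{\partial \nu}, \tag{32}$$

where we set $\delta = 10^{-2}$. For one individual in $D_2$, the ML method did not
converge with $\delta = 10^{-2}$. Because it converged with $\delta = 10^{-3}$, we used
this value for this particular individual.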

Because the likelihood may have multiple local maxima, we started the
gradient descent method with two different initial conditions, i.e.,
$(\alpha, \beta, \nu) = (0.6, 1.2, 0.6)$ and $(12, 24, 12)$ [hr$^{-1}$]. We found that
the final results corresponding to the two initial conditions were identical for
each individual.

For the ML method, the Hessian of the log likelihood can be explicitly given
and used in combination with the Newton method [33]. However, we found
that the Newton method fails to converge for many individuals, unlike
the simple gradient descent described above. Therefore, we did not use the
Newton method.

Because $\alpha, \beta, \nu > 0$ and $\alpha < \beta$ are needed for the Hawkes process
to be well defined, we forced the parameter values to satisfy these conditions.
In each update step, if the updated $\alpha$ becomes less than $10^{-6}$, we set
$\alpha = 10^{-6}$. Similarly, if $\alpha < \beta$ is violated, we set
$\beta = \alpha + 10^{-6}$. If $\nu < 10^{-6}$, we set $\nu = 10^{-6}$.
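
The update rule of Eqs. (30)-(32) combined with these constraints can be sketched as follows. This is a minimal illustration rather than the code used for the paper: `grad_fn` is a hypothetical callable that returns the three partial derivatives of the log likelihood, and the stopping rule (`n_steps`, `tol`) is an assumption, since the text does not specify one.

```python
def project(alpha, beta, nu, eps=1e-6):
    """Force alpha, nu >= eps and alpha < beta, as described in the text."""
    alpha = max(alpha, eps)
    if not alpha < beta:
        beta = alpha + eps
    nu = max(nu, eps)
    return alpha, beta, nu

def fit(grad_fn, init, delta=1e-2, n_steps=5000, tol=1e-10):
    """Repeat Eqs. (30)-(32) with step size delta, projecting after each step.

    grad_fn(alpha, beta, nu) is a hypothetical callable returning the
    partial derivatives of the log likelihood; n_steps and tol are
    illustrative stopping criteria.
    """
    alpha, beta, nu = init
    for _ in range(n_steps):
        g_alpha, g_beta, g_nu = grad_fn(alpha, beta, nu)
        new = project(alpha + delta * g_alpha,
                      beta + delta * g_beta,
                      nu + delta * g_nu)
        change = max(abs(new[0] - alpha), abs(new[1] - beta), abs(new[2] - nu))
        alpha, beta, nu = new
        if change < tol:
            break
    return alpha, beta, nu

# Toy usage with a concave quadratic whose maximum is at (1.0, 3.0, 0.5).
toy_grad = lambda a, b, n: (-(a - 1.0), -(b - 3.0), -(n - 0.5))
print(fit(toy_grad, init=(0.6, 1.2, 0.6)))
```

Running `fit` from both initial conditions mentioned above and comparing the results mirrors the check for multiple local maxima. For real data, the gradient magnitude grows with the number of events, so the step size may need to be reduced, as was done for one individual.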

The temporal resolution of our data is a minute. We set the unit time for 
the ML method to an hour such that our data has a resolution of 1/60 on 
this timescale. The data would be too discrete for the ML method to work 
without serious deviation if we set the unit time for the ML method to a 
minute. We verified that the results change little when we make the time
unit larger than one hour.

Fig. 1 Survivor functions of the IEI (i.e., probability that the IEI is larger than t) for
the conversation sequences of two individuals. For each of $D_1$ and $D_2$, the individual
with the largest number of events is selected. The selected individuals in $D_1$ and $D_2$
have 2397 and 2886 events, respectively.

Fig. 2 Conditional mean IEI defined by Eq. (1). (a) Conversation event sequences
[41]. (b) Email logs [6]. (c) Purchase of sexual escorts [36]. (d) Histogram of the IEI
for the data shown in (c).

Fig. 3 Example time course of event rate $\lambda(t)$ and the corresponding event sequence.

Fig. 4 Statistics of the IEI obtained from the Hawkes process. (a) CV, (b) IEI
correlation, and (c) mean cluster size c for various values of $\alpha$ and $\beta$.

Fig. 5 Social network constructed from data set $D_2$. A link indicates that the pair
of individuals has at least 10 conversation events during the recording period. The
network has 211 nodes and 2063 links.

Fig. 6 Survivor functions of the IEI for the Hawkes process with different values of
$\alpha$ and $\beta$.

Fig. 7 Survivor functions of the IEI for two individuals. (a) Results for an individual
in $D_1$, who has 1694 valid IEIs during the recording period. The ML estimators are
given by $\alpha = 4.91$, $\beta = 7.89$, and $\nu = 2.18$. (b) Results for an individual in D±,
who has 1765 valid IEIs. The ML estimators are given by $\alpha = 2.45$, $\beta = 3.86$, and
$\nu = 2.77$.

Fig. 8 Comparison between the data and the estimated model. (a) Mean IEI, (b)
CV, and (c) IEI correlation. Each data point corresponds to one valid individual. The
mean IEI, CV, and IEI correlation for the data are calculated on the basis of the days
containing at least 40 events.

Fig. 9 Results for the modified Hawkes process with $\nu = 1$. (a) CV with p = 0.1.
(b) IEI correlation with p = 0.1. (c) Relationship between the CV and IEI correlation
with p = 0 and p = 0.1.



